Skip to content

Vttablet schema tracking: Fix _vt.schema_version corruption#13045

Merged
mattlord merged 8 commits intovitessio:mainfrom
Shopify:fix-vt-schema_version-corruption
Jun 19, 2023
Merged

Vttablet schema tracking: Fix _vt.schema_version corruption#13045
mattlord merged 8 commits intovitessio:mainfrom
Shopify:fix-vt-schema_version-corruption

Conversation

@brendar
Copy link
Contributor

@brendar brendar commented May 8, 2023

Description

This PR fixes a race condition which caused protobuf marshaled schema data in _vt.schema_version rows to become corrupted when the ColumnType of query.Field pointers was modified between the time when Field message sizes were calculated and when Field message data was written to the buffer.

On Vitess versions <= 16, the race condition leads to invalid protobuf data being written to the schema_version table. When the schema historian tries to unmarshal the data it encounters an error, which breaks schema tracking on running tablets, and prevents newly started tablets from serving.

On Vitess main the race condition leads to a panic and the schema_version row is never written. This is due to the switch to using MarshalVT in #12525

I'm proposing to fix this by copying the fields returned from schema.Engine before modifying them. As far as I can tell, nothing relied on the side effect of setting ColumnType on the shared fields.

This PR also includes a defensive change to acquire the schema.Engine mutex while marshaling the schema to protobuf. This is not strictly necessary to fix the bug, but it could help avoid future race conditions or ones that haven't been discovered yet. I would be happy to remove it if anyone feels that it's unnecessary.

This bug was pretty disruptive for us, so I think the fix should be backported, but note that if it is backported, it may be desirable to change the call to MarshalVT back to proto.Marshal to match the existing code in released versions.

Related Issue(s)

Fixes #12981

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: Vttablet is writing invalid _vt.schema_version rows

7 participants